74 research outputs found
Efficient learning in Approximate Bayesian Computation
Operator norm convergence of spectral clustering on level sets
Following Hartigan, a cluster is defined as a connected component of the
t-level set of the underlying density, i.e., the set of points for which the
density is greater than t. A clustering algorithm which combines a density
estimate with spectral clustering techniques is proposed. Our algorithm is
composed of two steps. First, a nonparametric density estimate is used to
extract the data points for which the estimated density takes a value greater
than t. Next, the extracted points are clustered based on the eigenvectors of a
graph Laplacian matrix. Under mild assumptions, we prove the almost sure
convergence in operator norm of the empirical graph Laplacian operator
associated with the algorithm. Furthermore, we characterize the typical behavior of the representation of the dataset in the feature space, which establishes the strong consistency of our proposed algorithm.
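The two-step procedure described above can be sketched in a few lines. This is a minimal illustration only: the KDE bandwidth, the level t, the RBF affinity, the synthetic two-blob data, and the use of scikit-learn are assumptions for the sketch, not the paper's exact construction.

```python
# Sketch of level-set spectral clustering: (1) keep points whose estimated
# density exceeds t, (2) spectrally cluster the retained points.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs plus uniform background noise.
X = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=+3.0, scale=0.5, size=(200, 2)),
    rng.uniform(-6, 6, size=(50, 2)),
])

# Step 1: nonparametric density estimate; extract points with density > t.
kde = KernelDensity(bandwidth=0.5).fit(X)
density = np.exp(kde.score_samples(X))     # score_samples returns log-density
t = 0.01                                   # level chosen by the practitioner
high = X[density > t]

# Step 2: cluster the extracted points via graph-Laplacian eigenvectors.
labels = SpectralClustering(n_clusters=2, affinity="rbf",
                            gamma=1.0, random_state=0).fit_predict(high)
print(len(high), np.unique(labels))
```

With the level t above, essentially only the two blobs survive step 1, and step 2 separates them.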
The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous
Let M be a bounded domain of a Euclidean space with smooth boundary. We
relate the Cheeger constant of M and the conductance of a neighborhood graph
defined on a random sample from M. By restricting the minimization defining the
latter to a particular class of subsets, we obtain consistency (after
normalization) as the sample size increases, and show that any minimizing
sequence of subsets has a subsequence converging to a Cheeger set of M.
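For intuition, the conductance minimized in the discrete problem can be computed directly on a small neighborhood graph. The epsilon radius and the dumbbell-shaped sample below are illustrative assumptions, not the paper's setting.

```python
# Conductance of a subset S in an epsilon-neighborhood graph:
# weight of the cut divided by the smaller side's volume (sum of degrees).
import numpy as np

rng = np.random.default_rng(1)
# Sample from a "dumbbell": two unit squares separated by a gap of width 1.
left = rng.uniform([0, 0], [1, 1], size=(100, 2))
right = rng.uniform([2, 0], [3, 1], size=(100, 2))
X = np.vstack([left, right])

eps = 0.3
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
W = ((D <= eps) & (D > 0)).astype(float)       # epsilon-neighborhood graph

def conductance(W, S):
    """Cut weight over min(vol(S), vol(complement of S))."""
    S = np.asarray(S, dtype=bool)
    cut = W[S][:, ~S].sum()
    return cut / min(W[S].sum(), W[~S].sum())

# The natural bisection (left square vs right square) has conductance 0:
# the gap of width 1 exceeds eps, so no edge crosses it.
S = np.arange(len(X)) < 100
print(conductance(W, S))
```

Any cut through the middle of one square, by contrast, severs many epsilon-edges and has strictly positive conductance.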
Resampling: an improvement of Importance Sampling in varying population size models
Sequential importance sampling algorithms have been defined to estimate
likelihoods in models of ancestral population processes. However, these
algorithms are based on features of the models with constant population size,
and become inefficient when the population size varies in time, making
likelihood-based inferences difficult in many demographic situations. In this
work, we modify a previous sequential importance sampling algorithm to improve
the efficiency of the likelihood estimation. Our procedure is still based on
features of the model with constant size, but uses a resampling technique with
a new resampling probability distribution depending on the pairwise composite
likelihood. We tested our algorithm, called sequential importance sampling with
resampling (SISR), on simulated data sets under different demographic scenarios. In
most cases, the computational cost was halved for the same accuracy of
inference, and in some cases reduced a hundredfold. This study provides the first
assessment of the impact of such resampling techniques on parameter inference
using sequential importance sampling, and extends the range of situations where
likelihood inferences can easily be performed.
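The general mechanism (sequential importance sampling with a resampling step triggered when the weights degenerate) can be sketched on a toy linear-Gaussian state-space model. The model, the bootstrap proposal, and the ESS threshold below are illustrative stand-ins, not the coalescent model or the composite-likelihood resampling distribution of the paper.

```python
# Sequential importance sampling with adaptive resampling, estimating the
# likelihood of a toy state-space model: x_t = 0.9 x_{t-1} + noise,
# y_t = x_t + noise.
import numpy as np

rng = np.random.default_rng(2)
T, N = 50, 500                       # time steps, particles

x = np.zeros(T)                      # simulate a "true" trajectory
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.5)
y = x + rng.normal(scale=0.5, size=T)

def norm_pdf(v, mean, sd):
    return np.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

particles = np.zeros(N)
W = np.full(N, 1.0 / N)              # normalized importance weights
loglik = 0.0
for t in range(T):
    particles = 0.9 * particles + rng.normal(scale=0.5, size=N)  # propagate
    g = norm_pdf(y[t], particles, 0.5)       # observation density
    loglik += np.log(np.sum(W * g))          # likelihood increment
    W *= g
    W /= W.sum()
    if 1.0 / np.sum(W ** 2) < N / 2:         # effective sample size too low:
        idx = rng.choice(N, size=N, p=W)     # multinomial resampling
        particles = particles[idx]
        W = np.full(N, 1.0 / N)              # reset to uniform weights
print(loglik)
```

Without the resampling branch, a handful of particles would eventually carry almost all the weight and the likelihood estimate would degrade; resampling only when the effective sample size drops limits the extra noise resampling itself introduces.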
Bayesian functional linear regression with sparse step functions
The functional linear regression model is a common tool to determine the
relationship between a scalar outcome and a functional predictor seen as a
function of time. This paper focuses on the Bayesian estimation of the support
of the coefficient function. To this aim we propose a parsimonious and adaptive
decomposition of the coefficient function as a step function, and a model
including a prior distribution that we name Bayesian functional Linear
regression with Sparse Step functions (Bliss). The aim of the method is to
recover the periods of time that most influence the outcome. A Bayes estimator
of the support is built with a specific loss function, as well as two Bayes
estimators of the coefficient function, one smooth and one a step
function. The performance of the proposed
methodology is analysed on various synthetic datasets and is illustrated on a
black Périgord truffle dataset to study the influence of rainfall on the
production.
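The model structure underlying Bliss, a scalar outcome driven by a step-function coefficient acting on a functional predictor, can be simulated as follows. The grid, the interval locations, and the cosine-mixture predictors are illustrative assumptions, not the paper's data or prior.

```python
# Functional linear model with a step-function coefficient:
# y_i = integral of beta(t) X_i(t) dt + noise, beta piecewise constant.
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 101)            # observation times
dt = grid[1] - grid[0]

# Step-function coefficient: 2 on [0.2, 0.4], -1 on [0.6, 0.7], 0 elsewhere.
beta = np.where((grid >= 0.2) & (grid <= 0.4), 2.0, 0.0) \
     + np.where((grid >= 0.6) & (grid <= 0.7), -1.0, 0.0)
support = beta != 0                          # the set the method recovers

# Functional predictors: smooth random curves (cosine mixtures).
n = 50
coefs = rng.normal(size=(n, 4))
basis = np.cos(np.pi * np.outer(np.arange(1, 5), grid))
Xcurves = coefs @ basis

# Scalar outcomes via a Riemann sum on the grid.
y = (Xcurves * beta).sum(axis=1) * dt + rng.normal(scale=0.1, size=n)
print(y[:3])
```

Estimating the support amounts to recovering, from (Xcurves, y), the two intervals where beta is nonzero.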
Approximate Bayesian Computational methods
Also known as likelihood-free methods, approximate Bayesian computational
(ABC) methods have appeared in the past ten years as the most satisfactory
approach to intractable likelihood problems, first in genetics and then in a
broader spectrum of applications. However, these methods suffer to some degree
from calibration difficulties that make them rather volatile in their
implementation, and thus suspect to users of more traditional
Monte Carlo methods. In this survey, we study the various improvements and
extensions made to the original ABC algorithm over recent years.
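The original ABC rejection scheme that these extensions build on can be sketched on a toy problem (inferring a Gaussian mean). The prior, the summary statistic, and the tolerance below are illustrative choices.

```python
# ABC rejection: draw theta from the prior, simulate data under theta,
# accept theta when the simulated summary is close enough to the observed one.
import numpy as np

rng = np.random.default_rng(4)
observed = rng.normal(loc=1.5, scale=1.0, size=100)
s_obs = observed.mean()                      # summary statistic

def abc_rejection(n_draws, eps):
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5, 5)           # draw from the prior
        sim = rng.normal(loc=theta, scale=1.0, size=100)
        if abs(sim.mean() - s_obs) <= eps:   # compare simulated summary
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection(20000, eps=0.1)
print(post.mean(), len(post))
```

The calibration difficulties the survey discusses are visible even here: the tolerance eps trades off acceptance rate against approximation quality, and the choice of summary statistic is left to the user.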
Clustering by Estimation of Density Level Sets at a Fixed Probability
In density-based clustering methods, the clusters are defined as the connected components of the upper level sets of the underlying density f. In this setting, the practitioner fixes a probability p, and associates with it a threshold t(p) such that the level set {f >= t(p)} has probability p with respect to the distribution induced by f. This paper is devoted to the estimation of the threshold t(p), of the level set {f >= t(p)}, as well as of the number of connected components of this level set. Given a nonparametric density estimate f_n of f based on an i.i.d. n-sample drawn from f, we first propose a computationally simple estimate t_n(p) of t(p), and we establish a concentration inequality for this estimate. Next, we consider the plug-in level set estimate {f_n >= t_n(p)}, and we establish the exact convergence rate of the Lebesgue measure of the symmetric difference between {f >= t(p)} and {f_n >= t_n(p)}. Finally, we propose a computationally simple graph-based estimate of the number of connected components, which is shown to be consistent. Thus, the methodology yields a complete procedure for analyzing the grouping structure of the data, as p varies over (0, 1).
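A simple version of such a threshold estimate picks t(p) so that a fraction p of the sample falls in the estimated level set {f_n >= t(p)}, i.e., a quantile of the estimated density values at the sample points. The KDE and the two-component mixture below are illustrative assumptions, not the paper's exact estimator.

```python
# Estimate the threshold t(p) at a fixed probability p from a density
# estimate evaluated at the sample points.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
# A bimodal density: at moderate levels, the level set has two components.
X = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

kde = gaussian_kde(X)
dens = kde(X)                                # f_n evaluated at each sample point

def threshold(dens, p):
    """t(p): a fraction p of the sample has estimated density >= t(p)."""
    return np.quantile(dens, 1.0 - p)

t = threshold(dens, p=0.5)
in_level_set = dens >= t                     # plug-in level set membership
print(t, in_level_set.mean())
```

The points flagged by `in_level_set` are exactly the input to a graph-based count of connected components, the last step of the procedure described above.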
Efficient learning in ABC algorithms
Approximate Bayesian Computation has been successfully used in population
genetics to bypass the calculation of the likelihood. These methods provide
accurate estimates of the posterior distribution by comparing the observed
dataset to a sample of datasets simulated from the model. Although
parallelization is easily achieved, computation times for ensuring a suitable
approximation quality of the posterior distribution are still high. To
alleviate the computational burden, we propose an adaptive, sequential
algorithm that runs faster than other ABC algorithms but maintains accuracy of
the approximation. This proposal relies on the sequential Monte Carlo sampler
of Del Moral et al. (2012) but is calibrated to reduce the number of
simulations from the model. The paper concludes with numerical experiments on a
toy example and on a population genetic study of Apis mellifera, where our
algorithm was shown to be faster than traditional ABC schemes.
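The flavor of such a sequential scheme, with a tolerance shrunk adaptively from the current particle population, can be sketched as follows. This skeleton omits the importance weights and the MCMC move acceptance step of the full algorithm, and the toy Gaussian model is an illustrative assumption, not the authors' calibration.

```python
# Simplified sequential ABC: start from the prior, then repeatedly tighten
# the tolerance to a quantile of the current distances, resample the
# survivors, and perturb them.
import numpy as np

rng = np.random.default_rng(6)
obs = rng.normal(loc=1.5, size=100).mean()     # observed summary statistic

def simulate_summary(theta):
    """Simulate a dataset under theta and return its summary."""
    return rng.normal(loc=theta, size=100).mean()

N = 500
theta = rng.uniform(-5, 5, size=N)             # initial sample from the prior
dist = np.array([abs(simulate_summary(th) - obs) for th in theta])
eps = np.inf
for _ in range(5):                             # a fixed number of rounds
    eps = np.quantile(dist, 0.5)               # shrink the tolerance adaptively
    keep = dist <= eps
    # Resample the surviving particles and perturb them (Gaussian moves).
    idx = rng.choice(np.where(keep)[0], size=N)
    theta = theta[idx] + rng.normal(scale=0.2, size=N)
    dist = np.array([abs(simulate_summary(th) - obs) for th in theta])
print(eps, theta.mean())
```

Choosing each tolerance from the particles themselves, rather than from a fixed schedule, is what keeps the number of model simulations low: effort concentrates where the current population already is.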